Segmentation of Horizontally Overlapping Lines in Printed Indian Scripts
Authors
Abstract
Horizontally overlapping lines are commonly found in printed newspapers of any Indian script. Alongside these overlapping lines, the text also contains a few broken components of a line (strips) that hold less text than a complete line. Both the horizontally overlapping lines and these strips make it very difficult to estimate line boundaries, which leads to incorrect line segmentation and, in turn, lower recognition accuracy. In this paper we propose a solution for segmenting horizontally overlapping lines and for handling such strips in the eight most widely used printed Indian scripts. The whole document is divided into strips, and the proposed algorithm is applied to segment horizontally overlapping lines and to associate small strips with their respective lines. The algorithm achieves approximately 96.45% to 99.79% accuracy, depending on the script. We have also attempted to segment horizontally overlapping lines that contain text of different sizes, as in newspaper articles where larger heading lines overlap with normal-sized text lines.
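The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of dividing a page into strips and attaching small strips to a neighbouring line. It assumes a binarized page given as a list of rows with 1 for ink and 0 for background. The function names, the median-height heuristic, and the `ratio` parameters are all hypothetical and are not taken from the paper. Strips much taller than the median are only flagged as possible overlapping lines; splitting them is not attempted here.

```python
from typing import List, Tuple

Strip = Tuple[int, int]  # (first_row, last_row), both inclusive


def find_strips(image: List[List[int]]) -> List[Strip]:
    """Return maximal runs of consecutive rows containing at least one ink pixel."""
    strips: List[Strip] = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            strips.append((start, y - 1))
            start = None
    if start is not None:
        strips.append((start, len(image) - 1))
    return strips


def median_height(strips: List[Strip]) -> int:
    """Middle element of the sorted strip heights (upper median for even counts)."""
    heights = sorted(bottom - top + 1 for top, bottom in strips)
    return heights[len(heights) // 2]


def attach_small_strips(strips: List[Strip], ratio: float = 0.5) -> List[Strip]:
    """Merge strips shorter than ratio * median height into the nearer neighbour.

    Assumption (not from the paper): "nearer" means the smaller vertical gap,
    with ties going to the strip above.
    """
    if not strips:
        return []
    threshold = ratio * median_height(strips)
    merged: List[Strip] = []
    carry_top = None  # top of a small strip waiting to join the strip below
    for i, (top, bottom) in enumerate(strips):
        if carry_top is not None:
            top, carry_top = carry_top, None
        if bottom - top + 1 >= threshold:
            merged.append((top, bottom))
            continue
        has_next = i + 1 < len(strips)
        gap_above = top - merged[-1][1] if merged else float("inf")
        gap_below = strips[i + 1][0] - bottom if has_next else float("inf")
        if merged and gap_above <= gap_below:
            merged[-1] = (merged[-1][0], bottom)  # join the line above
        elif has_next:
            carry_top = top  # join the line below on the next iteration
        else:
            merged.append((top, bottom))  # isolated strip, keep as is
    return merged


def overlap_candidates(strips: List[Strip], ratio: float = 1.5) -> List[Strip]:
    """Flag strips taller than ratio * median height as possible overlapping lines."""
    if not strips:
        return []
    limit = ratio * median_height(strips)
    return [s for s in strips if s[1] - s[0] + 1 > limit]
```

For a page whose ink rows form the strips `(1, 4)`, `(6, 6)` and `(9, 12)`, `attach_small_strips` merges the one-row strip into the line above it because the gap above (2 rows) is smaller than the gap below (3 rows). A real system would likely need script-specific rules, for example for vowel signs that sit above or below the main line.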
Similar resources
On Segmentation of Touching Characters and Overlapping Lines in Degraded Printed Gurmukhi Script
Character segmentation plays a very important role in a text recognition system. The simple technique of using inter-character gap for segmentation is useful for fine printed documents, but this technique fails to give satisfactory results if the input text contains touching characters. In this paper, we have proposed two algorithms to segment touching characters, and one algorithm to segment o...
A Complete Machine printed Gurmukhi OCR System
Recognition of Indian language scripts is a challenging problem. Work for the development of complete OCR systems for Indian language scripts is still in infancy. Complete OCR systems have recently been developed for Devanagri and Bangla scripts. Research in the field of recognition of Gurmukhi script faces major problems mainly related to the unique characteristics of the script like connectiv...
Segmentation Problems and Solutions in Printed Degraded Gurmukhi Script
Character segmentation is an important preprocessing step for text recognition. In degraded documents, existence of touching characters decreases recognition rate drastically, for any optical character recognition (OCR) system. In this paper we have proposed a complete solution for segmenting touching characters in all the three zones of printed Gurmukhi script. A study of touching Gurmukhi cha...
An Efficient OCR for Printed Malayalam Text using Novel Segmentation Algorithm and SVM Classifiers
This paper describes an Optical Character Recognition (OCR) System for printed text documents in Malayalam, a South Indian language. Indian scripts are rich in patterns, and the combinations of such patterns make the problem even more complex; these complex patterns are exploited to arrive at the solution. The system segments the scanned document image into text lines, words and further ch...
OCR for printed Kannada text to Machine editable format using Database approach
This paper describes an Optical Character Recognition (OCR) system for printed text documents in Kannada, a South Indian language. The proposed OCR system for the recognition of printed Kannada text can handle all types of Kannada characters. The system first extracts an image of the Kannada script, then segments the image into lines, and then segments the words into sub-character level piece...